In today’s world, stock prices are one of the most crucial parts of a company's finances. The movement of stock prices is influenced by various factors, both internal and external. In this project, six companies are selected and their stock prices are analyzed over a span of two years. The analysis covers the changes in stock prices, the reasons for those changes, and the observation of outliers. The methods used are time series analysis, box plots, and the 3-sigma technique. Time series analysis is used for visualizing the data and observing the changes in stock prices. Box plots and the 3-sigma technique are used for identifying outliers.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import gzip
import shutil
input_file = 'all_ticks_wide.csv.gz'
output_file = 'data.csv' # Specify the desired name for the output CSV file
# Decompress the gzip archive into a plain CSV file
with gzip.open(input_file, 'rb') as f_in:
    with open(output_file, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
df = pd.read_csv("data.csv")
df.head()
| | timestamp | AEFES | AKBNK | AKSA | AKSEN | ALARK | ALBRK | ANACM | ARCLK | ASELS | ... | TTKOM | TUKAS | TUPRS | USAK | VAKBN | VESTL | YATAS | YKBNK | YUNSA | ZOREN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2012-09-17T06:45:00Z | 22.3978 | 5.2084 | 1.7102 | 3.87 | 1.4683 | 1.1356 | 1.0634 | 6.9909 | 2.9948 | ... | 4.2639 | 0.96 | 29.8072 | 1.0382 | 3.8620 | 1.90 | 0.4172 | 2.5438 | 2.2619 | 0.7789 |
| 1 | 2012-09-17T07:00:00Z | 22.3978 | 5.1938 | 1.7066 | 3.86 | 1.4574 | 1.1275 | 1.0634 | 6.9259 | 2.9948 | ... | 4.2521 | 0.96 | 29.7393 | 1.0382 | 3.8529 | 1.90 | 0.4229 | 2.5266 | 2.2462 | 0.7789 |
| 2 | 2012-09-17T07:15:00Z | 22.3978 | 5.2084 | 1.7102 | NaN | 1.4610 | 1.1356 | 1.0679 | 6.9909 | 2.9855 | ... | 4.2521 | 0.97 | 29.6716 | 1.0463 | 3.8436 | 1.91 | 0.4229 | 2.5266 | 2.2566 | 0.7789 |
| 3 | 2012-09-17T07:30:00Z | 22.3978 | 5.1938 | 1.7102 | 3.86 | 1.4537 | 1.1275 | 1.0679 | 6.9584 | 2.9855 | ... | 4.2521 | 0.97 | 29.7393 | 1.0382 | 3.8529 | 1.91 | 0.4286 | 2.5324 | 2.2619 | 0.7860 |
| 4 | 2012-09-17T07:45:00Z | 22.5649 | 5.2084 | 1.7102 | 3.87 | 1.4574 | 1.1356 | 1.0725 | 6.9909 | 2.9760 | ... | 4.2521 | 0.97 | 29.8072 | 1.0382 | 3.8620 | 1.90 | 0.4286 | 2.5324 | 2.2619 | 0.7789 |
5 rows × 61 columns
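As a side note, pandas can usually read a gzip-compressed CSV directly, so the intermediate decompressed file is optional. The sketch below illustrates this on a tiny made-up file; the file name `sample_ticks.csv.gz` and the columns `AAA` and `BBB` are hypothetical stand-ins, not part of the original data set.

```python
import gzip
import os
import tempfile

import pandas as pd

# Build a tiny gzip-compressed CSV so the example is self-contained
# (hypothetical stand-in for 'all_ticks_wide.csv.gz').
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample_ticks.csv.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write("timestamp,AAA,BBB\n")
    f.write("2015-01-02T07:00:00Z,1.0,2.0\n")
    f.write("2015-01-02T07:15:00Z,1.1,\n")

# read_csv infers gzip compression from the ".gz" suffix by default;
# parse_dates/index_col build the datetime index in the same step.
sample = pd.read_csv(path, parse_dates=["timestamp"], index_col="timestamp")
print(sample)
```

Reading the archive in one step avoids keeping a second, uncompressed copy of the data on disk.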
# Setting index to timestamp to have a proper format for time series data
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp")
# Choosing the columns to analyze
df = df[["GARAN", "YKBNK", "THYAO", "PETKM", "TCELL", "TTKOM"]]
# Define the date range for analysis
start_date = '2015-01-01'
end_date = '2017-01-01'
# Slice the DataFrame based on the date range
ts = df.loc[start_date:end_date]
# Plotting individual time series for each stock
for column in ts.columns:
    ts[column].plot(figsize=(12, 5))
    plt.title(f'Time Series Plot for {column}')
    plt.xlabel('Date')
    plt.ylabel('Closing Price')
    plt.grid(True)
    plt.show()
print(f"Number of NaN values for each column: \n{ts.isna().sum()}")
# Share of missing rows in the column with the most NaN values
proportion = ts.isna().sum().max() / ts.shape[0]
print(f"Proportion of the NaN values in the whole data in the column which has max NaN values: {proportion}")
# Since the proportion is small enough, dropping the NaN values will not cause a major problem.
Number of NaN values for each column:
GARAN    196
YKBNK    216
THYAO    195
PETKM    237
TCELL    233
TTKOM    227
dtype: int64
Proportion of the NaN values in the whole data in the column which has max NaN values: 0.016285301999587713
# Drop the NaN values
ts = ts.dropna().copy()  # copy() so that later column assignments act on an independent DataFrame
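One detail worth keeping in mind: `dropna()` removes every row that has a NaN in any column, so the share of rows actually dropped can be larger than the largest per-column share computed above. The following is a minimal sketch on made-up values (the frame `toy` and its columns `A` and `B` are hypothetical) that compares the two quantities.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values) with gaps that fall in different rows
toy = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 4.0, 5.0],
    "B": [1.0, 2.0, np.nan, 4.0, 5.0],
})

# Largest share of missing values in any single column
max_col_share = toy.isna().mean().max()

# Share of rows that dropna() removes (rows with at least one NaN)
row_share = toy.isna().any(axis=1).mean()

print(f"Max per-column NaN share: {max_col_share:.2f}")
print(f"Share of rows dropped by dropna(): {row_share:.2f}")
print(f"Rows kept: {len(toy.dropna())} of {len(toy)}")
```

If the gaps of the six stocks mostly fall on the same timestamps, the two shares will be close; checking `row_share` on the real data would confirm that.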
The following code was developed with ChatGPT after many trials. The prompts cannot be shared here because no single prompt produced it.
# Ensure that the DataFrame has a datetime index
ts.index = pd.to_datetime(ts.index)
# Extract the year and month information for each timestamp
ts['Year'] = ts.index.year
ts['Month'] = ts.index.month
# Sort the DataFrame by Year and Month
ts.sort_values(['Year', 'Month'], inplace=True)
# One subplot per company: six companies on a 2 x 3 grid
companies = ts.columns[:-2]  # Exclude 'Year' and 'Month' columns
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(30, 10))
# Loop through each company
for ax, company in zip(axes.flat, companies):
    # Group the data by Year and Month and plot one box per month
    ts.boxplot(column=company, by=['Year', 'Month'], ax=ax, vert=False)
    # Set titles and labels
    ax.set_title(company)
    ax.set_xlabel('Value')
    ax.set_ylabel('')
# Set the overall title after plotting, because boxplot(by=...) overwrites the figure title
fig.suptitle('Monthly Boxplots for Each Company', fontsize=16)
plt.tight_layout()
# Leave room at the top of the figure for the overall title
plt.subplots_adjust(top=0.92)
plt.show()
# Defining 1.5 IQR method to detect the outliers
def find_outliers_iqr(data, column):
    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers_IQR = data[(data[column] < lower_bound) | (data[column] > upper_bound)]
    return outliers_IQR
# Finding and visualizing the outliers for each column
columns_to_check = ["GARAN", "YKBNK", "THYAO", "PETKM", "TCELL", "TTKOM"]
# Finding the outliers for each column
for column in columns_to_check:
    outliers = find_outliers_iqr(ts, column)
    print(f"Number of outliers in '{column}': {outliers.shape[0]}")
    print(outliers)
    # Scatter plot the outliers
    plt.figure(figsize=(10, 6))
    plt.scatter(outliers.index, outliers[column], c='red', label='Outliers', marker='x', s=50)
    # Plot the non-outlier data in blue
    non_outliers = ts.drop(outliers.index)
    plt.scatter(non_outliers.index, non_outliers[column], c='blue', label='Non-Outliers', marker='o', s=10)
    plt.title(f'Scatter Plot of Outliers for {column}')
    plt.xlabel('Date')
    plt.ylabel('Closing Price')
    plt.legend(loc='best')
    plt.grid(True)
    plt.show()
Number of outliers in 'GARAN': 324
GARAN YKBNK THYAO PETKM TCELL TTKOM \
timestamp
2015-01-19 07:30:00+00:00 8.9397 3.1577 9.36 1.5230 10.8167 6.7574
2015-01-19 07:45:00+00:00 9.0026 3.1640 9.37 1.5306 10.8894 6.8200
2015-01-19 08:00:00+00:00 8.9666 3.1577 9.30 1.5230 10.8532 6.7842
2015-01-19 08:15:00+00:00 8.9666 3.1702 9.27 1.5191 10.8894 6.8110
2015-01-19 08:30:00+00:00 8.9666 3.1640 9.25 1.5230 10.9256 6.7932
... ... ... ... ... ... ...
2015-02-03 14:30:00+00:00 9.1376 3.2207 9.05 1.5002 10.1997 6.5519
2015-02-03 14:45:00+00:00 9.0477 3.1892 8.98 1.4963 10.1997 6.5876
2015-02-03 15:00:00+00:00 9.0477 3.1577 8.92 1.4925 10.1271 6.5429
2015-02-03 15:15:00+00:00 8.9577 3.1766 8.90 1.4963 10.1997 6.5429
2015-02-03 15:30:00+00:00 8.9577 3.1829 8.89 1.4887 10.1997 6.5876
Year Month
timestamp
2015-01-19 07:30:00+00:00 2015 1
2015-01-19 07:45:00+00:00 2015 1
2015-01-19 08:00:00+00:00 2015 1
2015-01-19 08:15:00+00:00 2015 1
2015-01-19 08:30:00+00:00 2015 1
... ... ...
2015-02-03 14:30:00+00:00 2015 2
2015-02-03 14:45:00+00:00 2015 2
2015-02-03 15:00:00+00:00 2015 2
2015-02-03 15:15:00+00:00 2015 2
2015-02-03 15:30:00+00:00 2015 2
[324 rows x 8 columns]
Number of outliers in 'YKBNK': 309
GARAN YKBNK THYAO PETKM TCELL TTKOM \
timestamp
2015-01-12 08:30:00+00:00 8.7685 3.2018 10.00 1.5267 10.5627 6.3641
2015-01-12 08:45:00+00:00 8.7956 3.2144 10.00 1.5345 10.5989 6.3641
2015-01-12 09:00:00+00:00 8.7956 3.2081 10.00 1.5306 10.5627 6.3730
2015-01-12 09:15:00+00:00 8.7956 3.2081 10.00 1.5345 10.5627 6.3730
2015-01-12 09:30:00+00:00 8.7866 3.2081 9.99 1.5306 10.5627 6.3730
... ... ... ... ... ... ...
2015-02-03 13:30:00+00:00 9.1376 3.2207 9.06 1.5002 10.2360 6.5697
2015-02-03 13:45:00+00:00 9.1828 3.2270 9.06 1.5002 10.2360 6.5340
2015-02-03 14:00:00+00:00 9.2277 3.2270 9.07 1.5040 10.2360 6.5519
2015-02-03 14:15:00+00:00 9.2277 3.2270 9.07 1.5002 10.2360 6.5519
2015-02-03 14:30:00+00:00 9.1376 3.2207 9.05 1.5002 10.1997 6.5519
Year Month
timestamp
2015-01-12 08:30:00+00:00 2015 1
2015-01-12 08:45:00+00:00 2015 1
2015-01-12 09:00:00+00:00 2015 1
2015-01-12 09:15:00+00:00 2015 1
2015-01-12 09:30:00+00:00 2015 1
... ... ...
2015-02-03 13:30:00+00:00 2015 2
2015-02-03 13:45:00+00:00 2015 2
2015-02-03 14:00:00+00:00 2015 2
2015-02-03 14:15:00+00:00 2015 2
2015-02-03 14:30:00+00:00 2015 2
[309 rows x 8 columns]
Number of outliers in 'THYAO': 0
Empty DataFrame
Columns: [GARAN, YKBNK, THYAO, PETKM, TCELL, TTKOM, Year, Month]
Index: []
Number of outliers in 'PETKM': 0
Empty DataFrame
Columns: [GARAN, YKBNK, THYAO, PETKM, TCELL, TTKOM, Year, Month]
Index: []
Number of outliers in 'TCELL': 0
Empty DataFrame
Columns: [GARAN, YKBNK, THYAO, PETKM, TCELL, TTKOM, Year, Month]
Index: []
Number of outliers in 'TTKOM': 0
Empty DataFrame
Columns: [GARAN, YKBNK, THYAO, PETKM, TCELL, TTKOM, Year, Month]
Index: []
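To make the 1.5 IQR rule used above concrete, here is a small worked sketch on ten made-up values. The helper name `iqr_bounds` and the numbers are hypothetical and only illustrate how the two fences are derived from the quartiles.

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences of the k * IQR rule for a Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up values: nine regular points and one obvious extreme
values = pd.Series([10, 11, 12, 13, 14, 15, 16, 17, 18, 40], dtype=float)

lower, upper = iqr_bounds(values)
flagged = values[(values < lower) | (values > upper)]

# Q1 = 12.25 and Q3 = 16.75, so IQR = 4.5 and the fences are 5.5 and 23.5
print(f"Fences: [{lower}, {upper}]")
print(f"Flagged values: {flagged.tolist()}")
```

Because the fences come from the quartiles of the whole two-year window, a price level that was normal early in the window can be flagged simply because the stock traded at a different level later.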
# Defining 3-sigma method for detecting outliers
def find_outliers_sigma(data, column):
    mean = data[column].mean()
    std = data[column].std()
    z_scores = abs((data[column] - mean) / std)
    outliers_3sigma = data[z_scores > 3]
    return outliers_3sigma
# Finding and visualizing the outliers for each column
columns_to_check = ["GARAN", "YKBNK", "THYAO", "PETKM", "TCELL", "TTKOM"]
# Initialize a figure to hold subplots
fig, axes = plt.subplots(nrows=len(columns_to_check), ncols=1, figsize=(10, 6 * len(columns_to_check)))
# Loop through each column and find outliers and create plots
for i, column in enumerate(columns_to_check):
    outliers = find_outliers_sigma(ts, column)
    print(f"Number of outliers in '{column}': {outliers.shape[0]}")
    print(outliers)
    # Scatter plot the outliers
    ax = axes[i]
    ax.scatter(outliers.index, outliers[column], c='red', label='Outliers', marker='x', s=50)
    # Plot the non-outlier data in blue
    non_outliers = ts.drop(outliers.index)
    ax.scatter(non_outliers.index, non_outliers[column], c='blue', label='Non-Outliers', marker='o', s=10)
    ax.set_title(f'Scatter Plot of Outliers for {column}')
    ax.set_xlabel('Date')
    ax.set_ylabel('Closing Price')
    ax.legend(loc='best')
    ax.grid(True)
plt.tight_layout()
plt.show()
Number of outliers in 'GARAN': 238
GARAN YKBNK THYAO PETKM TCELL TTKOM \
timestamp
2015-01-20 13:30:00+00:00 9.2277 3.2081 9.53 1.5191 10.9256 6.7574
2015-01-20 13:45:00+00:00 9.2277 3.2270 9.52 1.5116 10.8167 6.7395
2015-01-20 14:00:00+00:00 9.2277 3.2207 9.52 1.5116 10.8532 6.7306
2015-01-20 14:15:00+00:00 9.2277 3.2144 9.52 1.5152 10.8532 6.7127
2015-01-20 14:30:00+00:00 9.2728 3.1955 9.52 1.5152 10.8532 6.6859
... ... ... ... ... ... ...
2015-02-03 12:00:00+00:00 9.2277 3.2522 9.10 1.4925 10.2360 6.5608
2015-02-03 12:15:00+00:00 9.2277 3.2522 9.08 1.5040 10.2723 6.5697
2015-02-03 12:30:00+00:00 9.2728 3.2522 9.09 1.5040 10.2723 6.5340
2015-02-03 14:00:00+00:00 9.2277 3.2270 9.07 1.5040 10.2360 6.5519
2015-02-03 14:15:00+00:00 9.2277 3.2270 9.07 1.5002 10.2360 6.5519
Year Month
timestamp
2015-01-20 13:30:00+00:00 2015 1
2015-01-20 13:45:00+00:00 2015 1
2015-01-20 14:00:00+00:00 2015 1
2015-01-20 14:15:00+00:00 2015 1
2015-01-20 14:30:00+00:00 2015 1
... ... ...
2015-02-03 12:00:00+00:00 2015 2
2015-02-03 12:15:00+00:00 2015 2
2015-02-03 12:30:00+00:00 2015 2
2015-02-03 14:00:00+00:00 2015 2
2015-02-03 14:15:00+00:00 2015 2
[238 rows x 8 columns]
Number of outliers in 'YKBNK': 119
GARAN YKBNK THYAO PETKM TCELL TTKOM \
timestamp
2015-01-22 13:15:00+00:00 9.4528 3.2900 9.30 1.5230 11.0707 6.7663
2015-01-22 13:30:00+00:00 9.4979 3.2963 9.32 1.5191 11.0707 6.7753
2015-01-22 13:45:00+00:00 9.4979 3.2963 9.31 1.5152 11.0707 6.7663
2015-01-22 14:00:00+00:00 9.5428 3.2963 9.36 1.5191 11.0707 6.7663
2015-01-22 14:15:00+00:00 9.5428 3.3089 9.55 1.5191 11.0707 6.7574
... ... ... ... ... ... ...
2015-02-02 14:45:00+00:00 9.3177 3.2900 9.22 1.4544 10.4174 6.5787
2015-02-02 15:00:00+00:00 9.3628 3.2963 9.21 1.4506 10.4174 6.5697
2015-02-02 15:15:00+00:00 9.3628 3.3026 9.25 1.4544 10.3812 6.5876
2015-02-02 15:30:00+00:00 9.3177 3.2963 9.25 1.4506 10.3812 6.5876
2015-02-03 07:45:00+00:00 9.4078 3.2900 9.24 1.4659 10.3086 6.6323
Year Month
timestamp
2015-01-22 13:15:00+00:00 2015 1
2015-01-22 13:30:00+00:00 2015 1
2015-01-22 13:45:00+00:00 2015 1
2015-01-22 14:00:00+00:00 2015 1
2015-01-22 14:15:00+00:00 2015 1
... ... ...
2015-02-02 14:45:00+00:00 2015 2
2015-02-02 15:00:00+00:00 2015 2
2015-02-02 15:15:00+00:00 2015 2
2015-02-02 15:30:00+00:00 2015 2
2015-02-03 07:45:00+00:00 2015 2
[119 rows x 8 columns]
Number of outliers in 'THYAO': 0
Empty DataFrame
Columns: [GARAN, YKBNK, THYAO, PETKM, TCELL, TTKOM, Year, Month]
Index: []
Number of outliers in 'PETKM': 0
Empty DataFrame
Columns: [GARAN, YKBNK, THYAO, PETKM, TCELL, TTKOM, Year, Month]
Index: []
Number of outliers in 'TCELL': 0
Empty DataFrame
Columns: [GARAN, YKBNK, THYAO, PETKM, TCELL, TTKOM, Year, Month]
Index: []
Number of outliers in 'TTKOM': 0
Empty DataFrame
Columns: [GARAN, YKBNK, THYAO, PETKM, TCELL, TTKOM, Year, Month]
Index: []
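The 3-sigma rule above uses one mean and one standard deviation for the whole period, so on a trending price series it mostly reacts to the overall level rather than to sudden moves. As an alternative, and not the method used in this notebook, the sketch below applies the same rule against a rolling window of preceding observations. The function name `find_outliers_rolling_sigma`, the window length of 20 and the synthetic series are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def find_outliers_rolling_sigma(series, window=20, n_sigma=3.0):
    """Flag points more than n_sigma rolling standard deviations away
    from the rolling mean of the preceding `window` observations."""
    # shift(1) so each point is compared only with statistics of earlier points
    mean = series.rolling(window).mean().shift(1)
    std = series.rolling(window).std().shift(1)
    z_scores = (series - mean) / std
    return series[z_scores.abs() > n_sigma]


# Synthetic trending series with small noise and one injected spike
rng = np.random.default_rng(0)
values = np.linspace(10.0, 20.0, 200) + rng.normal(0.0, 0.05, 200)
values[150] += 2.0
prices = pd.Series(values, index=pd.date_range("2015-01-01", periods=200, freq="D"))

# Rolling rule: compares each point with its recent history
rolling_flagged = find_outliers_rolling_sigma(prices)

# Global rule, as in find_outliers_sigma: one mean and std for the whole series
global_z = (prices - prices.mean()) / prices.std()
global_flagged = prices[global_z.abs() > 3]

print(f"Rolling 3-sigma flags: {len(rolling_flagged)}")
print(f"Global 3-sigma flags: {len(global_flagged)}")
```

On this synthetic series the injected spike is picked up by the rolling rule but not by the global one, because the trend inflates the global standard deviation. Whether a rolling window is appropriate for the real data, and how long it should be, is a judgment call.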
The following code was developed with ChatGPT. Obtaining it was a long and painful process that took almost 15 tries, so it cannot be reproduced here as a single prompt.
import pandas as pd
import matplotlib.pyplot as plt
# Assuming you already have a DataFrame called "ts" with the specified format
# Ensure that the DataFrame has a datetime index
ts.index = pd.to_datetime(ts.index)
# Extract the year and month information for each timestamp
ts['Year'] = ts.index.year
ts['Month'] = ts.index.month
# Month names used for the subplot titles, in calendar order
months_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
# Sort the DataFrame by Year and Month
ts.sort_values(['Year', 'Month'], inplace=True)
# Initialize a figure to hold subplots: two rows (one per year) for each company, one column per month
fig, axes = plt.subplots(nrows=len(ts.columns[:-2]) * 2, ncols=12, figsize=(60, 45))
# Loop through each company
for i, company in enumerate(ts.columns[:-2]):  # Exclude 'Year' and 'Month' columns
    for year in ts['Year'].unique():
        for month_num, month_name in enumerate(months_order):
            ax = axes[i * 2 + (year - ts['Year'].min()), month_num]
            # Filter data for the current year and month
            company_data = ts[(ts['Year'] == year) & (ts['Month'] == (month_num + 1))]
            # Apply the 3-sigma rule within this month only
            outliers = find_outliers_sigma(company_data, company)
            # Scatter plot the outliers
            ax.scatter(outliers.index, outliers[company], c='red', label='Outliers', marker='x', s=50)
            # Plot the non-outlier data in blue
            non_outliers = company_data.drop(outliers.index)
            ax.scatter(non_outliers.index, non_outliers[company], c='blue', label='Non-Outliers', marker='o', s=10)
            ax.set_title(f'Scatter Plot of Outliers for {company} ({month_name} {year})')
            ax.set_xlabel('Date')
            ax.set_ylabel('Closing Price')
            # ax.legend(loc='best')
            ax.grid(True)
plt.tight_layout()
plt.show()
Since both Garanti and Yapı Kredi are banks, their price series behave similarly over the selected interval. The outlier analysis below is therefore built on this similarity.
ts["GARAN"].plot(title="Time Series plot for GARAN", c="g")
outliers = find_outliers_iqr(ts, "GARAN")
plt.scatter(outliers.index, outliers["GARAN"], c='red', label='Outliers', marker='x', s=10)
plt.xticks(rotation=90)
plt.title("Outliers in GARAN data")
nout_garan = outliers.shape[0]
ts["YKBNK"].plot(title="Time Series plot for YKBNK", c="navy")
outliers = find_outliers_iqr(ts, "YKBNK")
plt.scatter(outliers.index, outliers["YKBNK"], c='red', label='Outliers', marker='x', s=10)
plt.xticks(rotation=90)
plt.title("Outliers in YKBNK data")
nout_ykbnk = outliers.shape[0]
print(f"Number of outliers in GARAN data: {nout_garan}.")
print(f"Number of outliers in YKBNK data: {nout_ykbnk}.")
print("Yapı Kredi has fewer outliers, more robust patterns.")
Number of outliers in GARAN data: 324.
Number of outliers in YKBNK data: 309.
Yapı Kredi has fewer outliers, more robust patterns.
The code below was generated by ChatGPT.
ChatGPT Prompt: I believe you understand my objective and characteristics of my time series. I need you to write a code to construct a new df based on resample data daily and log[(Y_t+1)/Y_t] (log return).
# Calculate the log returns between consecutive 15-minute observations
daily_returns = np.log(1 + ts.pct_change())
# Drop the first row (NaN value) as there is no previous observation
daily_returns = daily_returns.dropna()
# Rename the columns to indicate that these are log returns
daily_returns.columns = [f'{col}_Log_Return' for col in daily_returns.columns]
# Resample to daily frequency, taking the mean for each day.
# Note: this is the mean 15-minute log return within each day, not the close-to-close daily return.
daily_returns_resampled = daily_returns.resample('D').mean()
# Keep only weekdays (weekends are non-trading days)
daily_returns_resampled = daily_returns_resampled[daily_returns_resampled.index.dayofweek < 5]
daily_returns_resampled.isna().sum()
GARAN_Log_Return    19
YKBNK_Log_Return    19
THYAO_Log_Return    19
PETKM_Log_Return    19
TCELL_Log_Return    19
TTKOM_Log_Return    19
Year_Log_Return     19
Month_Log_Return    19
dtype: int64
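The prompt asks for the log return, r_t = log(Y_{t+1} / Y_t), on daily data. One common way to obtain a close-to-close daily log return is to first take the last price of each day and then difference the logarithms. This is a hedged sketch of that alternative on made-up intraday prices (the column `AAA` and all values are hypothetical); it is not the computation used in the cell above, which averages 15-minute log returns within each day.

```python
import numpy as np
import pandas as pd

# Made-up intraday prices: three 15-minute ticks on each of two days
idx = pd.to_datetime([
    "2015-01-05 07:00", "2015-01-05 07:15", "2015-01-05 07:30",
    "2015-01-06 07:00", "2015-01-06 07:15", "2015-01-06 07:30",
], utc=True)
intraday = pd.DataFrame({"AAA": [10.0, 10.2, 10.1, 10.3, 10.4, 10.5]}, index=idx)

# Last observed price of each calendar day; days without data are dropped
daily_close = intraday.resample("D").last().dropna(how="all")

# Log return between consecutive daily closes: log(Y_{t+1}) - log(Y_t)
daily_log_returns = np.log(daily_close).diff().dropna()

print(daily_log_returns)
```

Here the single return equals log(10.5 / 10.1), the ratio of the two daily closing prices. Dropping empty days before differencing means a return after a weekend or holiday spans the whole gap.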
# Plot the resampled log returns, one figure per column
columns_to_plot = ["GARAN_Log_Return", "YKBNK_Log_Return", "THYAO_Log_Return", "PETKM_Log_Return", "TCELL_Log_Return", "TTKOM_Log_Return", "Month_Log_Return"]
for column in columns_to_plot:
    daily_returns_resampled[[column]].plot(figsize=(12, 6))
# Finding and visualizing the outliers for each column
columns_to_check = ["GARAN_Log_Return", "YKBNK_Log_Return", "THYAO_Log_Return", "PETKM_Log_Return", "TCELL_Log_Return", "TTKOM_Log_Return"]
# Finding the outliers for each column
for column in columns_to_check:
    outliers_resampled = find_outliers_iqr(daily_returns_resampled, column)
    print(f"Number of outliers in '{column}': {outliers_resampled.shape[0]}")
    print(outliers_resampled)
    # Scatter plot the outliers
    plt.figure(figsize=(10, 6))
    plt.scatter(outliers_resampled.index, outliers_resampled[column], c='red', label='Outliers', marker='x', s=50)
    # Plot the non-outlier data in blue
    non_outliers = daily_returns_resampled.drop(outliers_resampled.index)
    plt.scatter(non_outliers.index, non_outliers[column], c='blue', label='Non-Outliers', marker='o', s=10)
    plt.title(f'Scatter Plot of Outliers for {column}')
    plt.xlabel('Date')
    plt.ylabel('Log Return')
    plt.legend(loc='best')
    plt.grid(True)
    plt.show()
Number of outliers in 'GARAN_Log_Return': 9
GARAN_Log_Return YKBNK_Log_Return \
timestamp
2015-02-04 00:00:00+00:00 -0.002264 -0.001193
2015-03-19 00:00:00+00:00 0.002143 0.001315
2015-06-08 00:00:00+00:00 -0.002876 -0.003149
2015-10-28 00:00:00+00:00 -0.002832 -0.003244
2015-11-02 00:00:00+00:00 0.003320 0.003576
2015-12-11 00:00:00+00:00 -0.001800 -0.001202
2015-12-15 00:00:00+00:00 0.001943 0.001223
2016-07-18 00:00:00+00:00 -0.003066 -0.002929
2016-09-26 00:00:00+00:00 -0.001921 -0.001774
THYAO_Log_Return PETKM_Log_Return \
timestamp
2015-02-04 00:00:00+00:00 0.000414 -0.000285
2015-03-19 00:00:00+00:00 0.001058 0.000204
2015-06-08 00:00:00+00:00 -0.002467 -0.001100
2015-10-28 00:00:00+00:00 -0.002731 -0.001426
2015-11-02 00:00:00+00:00 0.002455 0.001931
2015-12-11 00:00:00+00:00 -0.001746 -0.001241
2015-12-15 00:00:00+00:00 0.002044 0.000979
2016-07-18 00:00:00+00:00 -0.004483 -0.001409
2016-09-26 00:00:00+00:00 -0.001402 -0.000639
TCELL_Log_Return TTKOM_Log_Return \
timestamp
2015-02-04 00:00:00+00:00 -0.000132 -0.000101
2015-03-19 00:00:00+00:00 0.000725 -0.000306
2015-06-08 00:00:00+00:00 -0.001544 -0.001129
2015-10-28 00:00:00+00:00 0.000987 -0.003343
2015-11-02 00:00:00+00:00 0.000633 0.001102
2015-12-11 00:00:00+00:00 -0.000779 -0.000670
2015-12-15 00:00:00+00:00 0.001460 0.000681
2016-07-18 00:00:00+00:00 -0.001183 -0.001452
2016-09-26 00:00:00+00:00 -0.000268 -0.000516
Year_Log_Return Month_Log_Return
timestamp
2015-02-04 00:00:00+00:00 0.0 0.00000
2015-03-19 00:00:00+00:00 0.0 0.00000
2015-06-08 00:00:00+00:00 0.0 0.00000
2015-10-28 00:00:00+00:00 0.0 0.00000
2015-11-02 00:00:00+00:00 0.0 0.00353
2015-12-11 00:00:00+00:00 0.0 0.00000
2015-12-15 00:00:00+00:00 0.0 0.00000
2016-07-18 00:00:00+00:00 0.0 0.00000
2016-09-26 00:00:00+00:00 0.0 0.00000
Number of outliers in 'YKBNK_Log_Return': 12
GARAN_Log_Return YKBNK_Log_Return \
timestamp
2015-03-10 00:00:00+00:00 -0.001638 -0.001673
2015-06-08 00:00:00+00:00 -0.002876 -0.003149
2015-08-25 00:00:00+00:00 0.001493 0.001748
2015-10-28 00:00:00+00:00 -0.002832 -0.003244
2015-11-02 00:00:00+00:00 0.003320 0.003576
2015-11-24 00:00:00+00:00 -0.001463 -0.001776
2016-03-30 00:00:00+00:00 0.000875 0.001731
2016-05-20 00:00:00+00:00 -0.000907 -0.001916
2016-06-24 00:00:00+00:00 -0.001557 -0.001650
2016-07-18 00:00:00+00:00 -0.003066 -0.002929
2016-07-21 00:00:00+00:00 -0.001303 -0.002592
2016-09-26 00:00:00+00:00 -0.001921 -0.001774
THYAO_Log_Return PETKM_Log_Return \
timestamp
2015-03-10 00:00:00+00:00 -0.001193 -0.001789
2015-06-08 00:00:00+00:00 -0.002467 -0.001100
2015-08-25 00:00:00+00:00 0.001357 0.000761
2015-10-28 00:00:00+00:00 -0.002731 -0.001426
2015-11-02 00:00:00+00:00 0.002455 0.001931
2015-11-24 00:00:00+00:00 -0.002748 -0.000854
2016-03-30 00:00:00+00:00 0.000606 0.000603
2016-05-20 00:00:00+00:00 -0.000384 -0.000545
2016-06-24 00:00:00+00:00 -0.000928 -0.001419
2016-07-18 00:00:00+00:00 -0.004483 -0.001409
2016-07-21 00:00:00+00:00 -0.001210 -0.000917
2016-09-26 00:00:00+00:00 -0.001402 -0.000639
TCELL_Log_Return TTKOM_Log_Return \
timestamp
2015-03-10 00:00:00+00:00 -0.000736 -0.001096
2015-06-08 00:00:00+00:00 -0.001544 -0.001129
2015-08-25 00:00:00+00:00 0.000504 0.000677
2015-10-28 00:00:00+00:00 0.000987 -0.003343
2015-11-02 00:00:00+00:00 0.000633 0.001102
2015-11-24 00:00:00+00:00 -0.002360 -0.001878
2016-03-30 00:00:00+00:00 0.000202 0.000402
2016-05-20 00:00:00+00:00 -0.000062 0.000111
2016-06-24 00:00:00+00:00 -0.000545 -0.001007
2016-07-18 00:00:00+00:00 -0.001183 -0.001452
2016-07-21 00:00:00+00:00 -0.001155 -0.001352
2016-09-26 00:00:00+00:00 -0.000268 -0.000516
Year_Log_Return Month_Log_Return
timestamp
2015-03-10 00:00:00+00:00 0.0 0.00000
2015-06-08 00:00:00+00:00 0.0 0.00000
2015-08-25 00:00:00+00:00 0.0 0.00000
2015-10-28 00:00:00+00:00 0.0 0.00000
2015-11-02 00:00:00+00:00 0.0 0.00353
2015-11-24 00:00:00+00:00 0.0 0.00000
2016-03-30 00:00:00+00:00 0.0 0.00000
2016-05-20 00:00:00+00:00 0.0 0.00000
2016-06-24 00:00:00+00:00 0.0 0.00000
2016-07-18 00:00:00+00:00 0.0 0.00000
2016-07-21 00:00:00+00:00 0.0 0.00000
2016-09-26 00:00:00+00:00 0.0 0.00000
Number of outliers in 'THYAO_Log_Return': 14
GARAN_Log_Return YKBNK_Log_Return \
timestamp
2015-01-05 00:00:00+00:00 0.000547 0.000152
2015-01-20 00:00:00+00:00 0.001453 0.000440
2015-06-08 00:00:00+00:00 -0.002876 -0.003149
2015-07-23 00:00:00+00:00 -0.001154 -0.000999
2015-08-24 00:00:00+00:00 -0.000934 -0.001102
2015-10-28 00:00:00+00:00 -0.002832 -0.003244
2015-11-02 00:00:00+00:00 0.003320 0.003576
2015-11-24 00:00:00+00:00 -0.001463 -0.001776
2015-12-10 00:00:00+00:00 -0.001355 -0.001063
2015-12-11 00:00:00+00:00 -0.001800 -0.001202
2015-12-15 00:00:00+00:00 0.001943 0.001223
2016-04-20 00:00:00+00:00 -0.000266 -0.000620
2016-07-18 00:00:00+00:00 -0.003066 -0.002929
2016-08-08 00:00:00+00:00 0.001120 0.000851
THYAO_Log_Return PETKM_Log_Return \
timestamp
2015-01-05 00:00:00+00:00 0.001785 0.000374
2015-01-20 00:00:00+00:00 0.002583 0.000088
2015-06-08 00:00:00+00:00 -0.002467 -0.001100
2015-07-23 00:00:00+00:00 -0.001803 -0.001182
2015-08-24 00:00:00+00:00 -0.002040 -0.000945
2015-10-28 00:00:00+00:00 -0.002731 -0.001426
2015-11-02 00:00:00+00:00 0.002455 0.001931
2015-11-24 00:00:00+00:00 -0.002748 -0.000854
2015-12-10 00:00:00+00:00 -0.001747 0.000362
2015-12-11 00:00:00+00:00 -0.001746 -0.001241
2015-12-15 00:00:00+00:00 0.002044 0.000979
2016-04-20 00:00:00+00:00 -0.002347 -0.000672
2016-07-18 00:00:00+00:00 -0.004483 -0.001409
2016-08-08 00:00:00+00:00 0.001948 0.000399
TCELL_Log_Return TTKOM_Log_Return \
timestamp
2015-01-05 00:00:00+00:00 0.000651 1.539144e-04
2015-01-20 00:00:00+00:00 0.000123 -4.421335e-04
2015-06-08 00:00:00+00:00 -0.001544 -1.128601e-03
2015-07-23 00:00:00+00:00 -0.001195 -1.480846e-03
2015-08-24 00:00:00+00:00 -0.001815 -1.581186e-03
2015-10-28 00:00:00+00:00 0.000987 -3.342908e-03
2015-11-02 00:00:00+00:00 0.000633 1.102331e-03
2015-11-24 00:00:00+00:00 -0.002360 -1.878049e-03
2015-12-10 00:00:00+00:00 -0.001659 -8.918632e-04
2015-12-11 00:00:00+00:00 -0.000779 -6.697113e-04
2015-12-15 00:00:00+00:00 0.001460 6.809071e-04
2016-04-20 00:00:00+00:00 0.000419 -2.312965e-19
2016-07-18 00:00:00+00:00 -0.001183 -1.451759e-03
2016-08-08 00:00:00+00:00 0.000584 2.214848e-04
Year_Log_Return Month_Log_Return
timestamp
2015-01-05 00:00:00+00:00 0.0 0.00000
2015-01-20 00:00:00+00:00 0.0 0.00000
2015-06-08 00:00:00+00:00 0.0 0.00000
2015-07-23 00:00:00+00:00 0.0 0.00000
2015-08-24 00:00:00+00:00 0.0 0.00000
2015-10-28 00:00:00+00:00 0.0 0.00000
2015-11-02 00:00:00+00:00 0.0 0.00353
2015-11-24 00:00:00+00:00 0.0 0.00000
2015-12-10 00:00:00+00:00 0.0 0.00000
2015-12-11 00:00:00+00:00 0.0 0.00000
2015-12-15 00:00:00+00:00 0.0 0.00000
2016-04-20 00:00:00+00:00 0.0 0.00000
2016-07-18 00:00:00+00:00 0.0 0.00000
2016-08-08 00:00:00+00:00 0.0 0.00000
Number of outliers in 'PETKM_Log_Return': 12
GARAN_Log_Return YKBNK_Log_Return \
timestamp
2015-03-10 00:00:00+00:00 -0.001638 -0.001673
2015-03-13 00:00:00+00:00 -0.001179 -0.000543
2015-04-30 00:00:00+00:00 -0.001199 -0.001217
2015-06-24 00:00:00+00:00 0.000559 0.000546
2015-10-28 00:00:00+00:00 -0.002832 -0.003244
2015-11-02 00:00:00+00:00 0.003320 0.003576
2016-02-18 00:00:00+00:00 0.000093 -0.000442
2016-03-29 00:00:00+00:00 -0.000336 0.000340
2016-06-24 00:00:00+00:00 -0.001557 -0.001650
2016-07-18 00:00:00+00:00 -0.003066 -0.002929
2016-11-04 00:00:00+00:00 -0.001655 -0.001472
2016-12-01 00:00:00+00:00 -0.000401 -0.000627
THYAO_Log_Return PETKM_Log_Return \
timestamp
2015-03-10 00:00:00+00:00 -0.001193 -0.001789
2015-03-13 00:00:00+00:00 -0.001078 -0.001487
2015-04-30 00:00:00+00:00 -0.001552 -0.001351
2015-06-24 00:00:00+00:00 0.000656 0.001407
2015-10-28 00:00:00+00:00 -0.002731 -0.001426
2015-11-02 00:00:00+00:00 0.002455 0.001931
2016-02-18 00:00:00+00:00 0.000485 0.001484
2016-03-29 00:00:00+00:00 0.000044 0.001424
2016-06-24 00:00:00+00:00 -0.000928 -0.001419
2016-07-18 00:00:00+00:00 -0.004483 -0.001409
2016-11-04 00:00:00+00:00 -0.001664 -0.001654
2016-12-01 00:00:00+00:00 -0.001230 -0.002008
TCELL_Log_Return TTKOM_Log_Return \
timestamp
2015-03-10 00:00:00+00:00 -0.000736 -0.001096
2015-03-13 00:00:00+00:00 -0.000147 -0.001910
2015-04-30 00:00:00+00:00 -0.001524 -0.000250
2015-06-24 00:00:00+00:00 0.000147 0.000205
2015-10-28 00:00:00+00:00 0.000987 -0.003343
2015-11-02 00:00:00+00:00 0.000633 0.001102
2016-02-18 00:00:00+00:00 0.000389 0.000177
2016-03-29 00:00:00+00:00 -0.000547 -0.000352
2016-06-24 00:00:00+00:00 -0.000545 -0.001007
2016-07-18 00:00:00+00:00 -0.001183 -0.001452
2016-11-04 00:00:00+00:00 -0.000311 -0.000830
2016-12-01 00:00:00+00:00 -0.000504 -0.002611
Year_Log_Return Month_Log_Return
timestamp
2015-03-10 00:00:00+00:00 0.0 0.000000
2015-03-13 00:00:00+00:00 0.0 0.000000
2015-04-30 00:00:00+00:00 0.0 0.000000
2015-06-24 00:00:00+00:00 0.0 0.000000
2015-10-28 00:00:00+00:00 0.0 0.000000
2015-11-02 00:00:00+00:00 0.0 0.003530
2016-02-18 00:00:00+00:00 0.0 0.000000
2016-03-29 00:00:00+00:00 0.0 0.000000
2016-06-24 00:00:00+00:00 0.0 0.000000
2016-07-18 00:00:00+00:00 0.0 0.000000
2016-11-04 00:00:00+00:00 0.0 0.000000
2016-12-01 00:00:00+00:00 0.0 0.002807
Number of outliers in 'TCELL_Log_Return': 16
GARAN_Log_Return YKBNK_Log_Return \
timestamp
2015-01-13 00:00:00+00:00 0.000077 0.000073
2015-01-23 00:00:00+00:00 -0.000174 -0.000350
2015-02-11 00:00:00+00:00 -0.000617 0.000079
2015-02-13 00:00:00+00:00 -0.000319 0.000312
2015-03-25 00:00:00+00:00 0.000126 0.000435
2015-04-30 00:00:00+00:00 -0.001199 -0.001217
2015-06-03 00:00:00+00:00 0.000832 0.001181
2015-06-08 00:00:00+00:00 -0.002876 -0.003149
2015-08-24 00:00:00+00:00 -0.000934 -0.001102
2015-11-24 00:00:00+00:00 -0.001463 -0.001776
2015-11-26 00:00:00+00:00 -0.001356 -0.001301
2015-12-10 00:00:00+00:00 -0.001355 -0.001063
2015-12-15 00:00:00+00:00 0.001943 0.001223
2015-12-18 00:00:00+00:00 -0.000137 -0.000398
2016-05-04 00:00:00+00:00 -0.000871 -0.000488
2016-07-14 00:00:00+00:00 0.000604 0.000241
THYAO_Log_Return PETKM_Log_Return \
timestamp
2015-01-13 00:00:00+00:00 -0.000150 0.000649
2015-01-23 00:00:00+00:00 -0.000628 -0.000095
2015-02-11 00:00:00+00:00 0.000165 0.000296
2015-02-13 00:00:00+00:00 -0.000725 -0.000487
2015-03-25 00:00:00+00:00 -0.000085 0.000105
2015-04-30 00:00:00+00:00 -0.001552 -0.001351
2015-06-03 00:00:00+00:00 0.000703 0.000870
2015-06-08 00:00:00+00:00 -0.002467 -0.001100
2015-08-24 00:00:00+00:00 -0.002040 -0.000945
2015-11-24 00:00:00+00:00 -0.002748 -0.000854
2015-11-26 00:00:00+00:00 -0.001039 -0.000848
2015-12-10 00:00:00+00:00 -0.001747 0.000362
2015-12-15 00:00:00+00:00 0.002044 0.000979
2015-12-18 00:00:00+00:00 -0.000458 -0.000429
2016-05-04 00:00:00+00:00 0.000451 0.001079
2016-07-14 00:00:00+00:00 0.000166 0.000492
TCELL_Log_Return TTKOM_Log_Return \
timestamp
2015-01-13 00:00:00+00:00 0.001393 0.000104
2015-01-23 00:00:00+00:00 -0.001351 0.000679
2015-02-11 00:00:00+00:00 0.001801 0.000214
2015-02-13 00:00:00+00:00 -0.001664 -0.000051
2015-03-25 00:00:00+00:00 0.001942 0.000051
2015-04-30 00:00:00+00:00 -0.001524 -0.000250
2015-06-03 00:00:00+00:00 0.001393 0.000846
2015-06-08 00:00:00+00:00 -0.001544 -0.001129
2015-08-24 00:00:00+00:00 -0.001815 -0.001581
2015-11-24 00:00:00+00:00 -0.002360 -0.001878
2015-11-26 00:00:00+00:00 -0.001305 -0.000620
2015-12-10 00:00:00+00:00 -0.001659 -0.000892
2015-12-15 00:00:00+00:00 0.001460 0.000681
2015-12-18 00:00:00+00:00 -0.001435 -0.000977
2016-05-04 00:00:00+00:00 -0.002072 -0.000630
2016-07-14 00:00:00+00:00 0.001477 0.000566
Year_Log_Return Month_Log_Return
timestamp
2015-01-13 00:00:00+00:00 0.0 0.0
2015-01-23 00:00:00+00:00 0.0 0.0
2015-02-11 00:00:00+00:00 0.0 0.0
2015-02-13 00:00:00+00:00 0.0 0.0
2015-03-25 00:00:00+00:00 0.0 0.0
2015-04-30 00:00:00+00:00 0.0 0.0
2015-06-03 00:00:00+00:00 0.0 0.0
2015-06-08 00:00:00+00:00 0.0 0.0
2015-08-24 00:00:00+00:00 0.0 0.0
2015-11-24 00:00:00+00:00 0.0 0.0
2015-11-26 00:00:00+00:00 0.0 0.0
2015-12-10 00:00:00+00:00 0.0 0.0
2015-12-15 00:00:00+00:00 0.0 0.0
2015-12-18 00:00:00+00:00 0.0 0.0
2016-05-04 00:00:00+00:00 0.0 0.0
2016-07-14 00:00:00+00:00 0.0 0.0
Number of outliers in 'TTKOM_Log_Return': 12
GARAN_Log_Return YKBNK_Log_Return \
timestamp
2015-02-12 00:00:00+00:00 0.001622 0.000874
2015-03-13 00:00:00+00:00 -0.001179 -0.000543
2015-04-22 00:00:00+00:00 -0.000777 -0.000523
2015-04-24 00:00:00+00:00 0.000819 0.000611
2015-08-24 00:00:00+00:00 -0.000934 -0.001102
2015-10-07 00:00:00+00:00 0.001574 0.001347
2015-10-23 00:00:00+00:00 0.000568 0.000171
2015-10-28 00:00:00+00:00 -0.002832 -0.003244
2015-11-24 00:00:00+00:00 -0.001463 -0.001776
2016-02-04 00:00:00+00:00 0.000587 0.000956
2016-04-21 00:00:00+00:00 -0.000345 0.000312
2016-12-01 00:00:00+00:00 -0.000401 -0.000627
THYAO_Log_Return PETKM_Log_Return \
timestamp
2015-02-12 00:00:00+00:00 0.001217 7.807799e-04
2015-03-13 00:00:00+00:00 -0.001078 -1.487272e-03
2015-04-22 00:00:00+00:00 -0.000242 -1.030740e-04
2015-04-24 00:00:00+00:00 0.000482 5.862560e-04
2015-08-24 00:00:00+00:00 -0.002040 -9.454449e-04
2015-10-07 00:00:00+00:00 0.000219 -2.216591e-18
2015-10-23 00:00:00+00:00 0.000511 5.749254e-04
2015-10-28 00:00:00+00:00 -0.002731 -1.426491e-03
2015-11-24 00:00:00+00:00 -0.002748 -8.539887e-04
2016-02-04 00:00:00+00:00 0.000322 -4.770144e-04
2016-04-21 00:00:00+00:00 -0.000384 8.454406e-05
2016-12-01 00:00:00+00:00 -0.001230 -2.007516e-03
TCELL_Log_Return TTKOM_Log_Return \
timestamp
2015-02-12 00:00:00+00:00 0.000125 0.001619
2015-03-13 00:00:00+00:00 -0.000147 -0.001910
2015-04-22 00:00:00+00:00 0.000158 0.001513
2015-04-24 00:00:00+00:00 0.001092 0.001602
2015-08-24 00:00:00+00:00 -0.001815 -0.001581
2015-10-07 00:00:00+00:00 0.000840 0.001478
2015-10-23 00:00:00+00:00 0.000813 0.001495
2015-10-28 00:00:00+00:00 0.000987 -0.003343
2015-11-24 00:00:00+00:00 -0.002360 -0.001878
2016-02-04 00:00:00+00:00 0.001043 0.001682
2016-04-21 00:00:00+00:00 -0.000758 -0.001911
2016-12-01 00:00:00+00:00 -0.000504 -0.002611
Year_Log_Return Month_Log_Return
timestamp
2015-02-12 00:00:00+00:00 0.0 0.000000
2015-03-13 00:00:00+00:00 0.0 0.000000
2015-04-22 00:00:00+00:00 0.0 0.000000
2015-04-24 00:00:00+00:00 0.0 0.000000
2015-08-24 00:00:00+00:00 0.0 0.000000
2015-10-07 00:00:00+00:00 0.0 0.000000
2015-10-23 00:00:00+00:00 0.0 0.000000
2015-10-28 00:00:00+00:00 0.0 0.000000
2015-11-24 00:00:00+00:00 0.0 0.000000
2016-02-04 00:00:00+00:00 0.0 0.000000
2016-04-21 00:00:00+00:00 0.0 0.000000
2016-12-01 00:00:00+00:00 0.0 0.002807
ChatGPT Prompt: Write a python code to show the correlation matrix and volatility. Explain them in detail. Give general informations about them.
# Calculate the correlation matrix
correlation_matrix = daily_returns_resampled.corr()
# Plot the correlation matrix as a heatmap
plt.figure(figsize=(10, 8))
plt.imshow(correlation_matrix, cmap='coolwarm', interpolation='nearest')
plt.colorbar()
plt.title('Correlation Matrix of Daily Log Returns')
plt.xticks(range(len(correlation_matrix.columns)), correlation_matrix.columns, rotation=90)
plt.yticks(range(len(correlation_matrix.columns)), correlation_matrix.columns)
plt.show()
# Calculate and plot volatility (standard deviation) for each company
volatility = daily_returns_resampled.std()
plt.figure(figsize=(10, 5))
volatility.plot(kind='bar', color='skyblue')
plt.title('Volatility (Standard Deviation) of Daily Log Returns')
plt.xlabel('Company')
plt.ylabel('Volatility')
plt.xticks(rotation=45)
plt.show()
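The two quantities above can also be checked numerically, without a plot. The following is a minimal sketch on synthetic data, not the project's data: the column names `A_Log_Return`, `B_Log_Return`, `C_Log_Return` and the 252-trading-day annualization factor are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily log returns (illustrative only): A and B share a common
# component, so they should be strongly correlated; C is independent and noisier.
rng = np.random.default_rng(42)
common = rng.normal(0, 0.001, 250)
demo_returns = pd.DataFrame({
    "A_Log_Return": common + rng.normal(0, 0.0005, 250),
    "B_Log_Return": common + rng.normal(0, 0.0005, 250),
    "C_Log_Return": rng.normal(0, 0.002, 250),
})

# Pearson correlation between every pair of columns
corr = demo_returns.corr()

# Volatility as the sample standard deviation of daily returns
daily_vol = demo_returns.std()

# Annualized volatility, assuming 252 trading days per year
annual_vol = daily_vol * np.sqrt(252)

# Hide the diagonal (always 1.0) and find the most correlated pair
off_diagonal = corr.where(~np.eye(len(corr), dtype=bool))
strongest_pair = off_diagonal.stack().idxmax()

print(corr.round(2))
print(annual_vol.round(4))
print("Most correlated pair:", strongest_pair)
```

Values of the correlation close to 1 mean the two series tend to move together, values close to 0 mean little linear relationship, and a larger standard deviation means larger typical daily moves.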
Correlation Analysis:
Volatility Analysis:
d_ts = daily_returns_resampled.drop(["Month_Log_Return", "Year_Log_Return"], axis=1)
d_ts.head()
| timestamp | GARAN_Log_Return | YKBNK_Log_Return | THYAO_Log_Return | PETKM_Log_Return | TCELL_Log_Return | TTKOM_Log_Return |
|---|---|---|---|---|---|---|
| 2015-01-02 00:00:00+00:00 | -0.000244 | 5.771292e-18 | 2.381527e-04 | -0.000388 | -6.756372e-04 | 0.000053 |
| 2015-01-05 00:00:00+00:00 | 0.000547 | 1.519949e-04 | 1.785263e-03 | 0.000374 | 6.506136e-04 | 0.000154 |
| 2015-01-06 00:00:00+00:00 | 0.000232 | 2.238614e-04 | -2.145917e-17 | -0.000190 | -1.289262e-04 | -0.000051 |
| 2015-01-07 00:00:00+00:00 | 0.000116 | 1.504634e-04 | -5.487069e-04 | 0.000190 | -1.037622e-17 | -0.000517 |
| 2015-01-08 00:00:00+00:00 | 0.000458 | 4.430841e-04 | 1.838070e-04 | 0.000641 | -7.597446e-18 | 0.000104 |
ChatGPT Prompt: how to fill nan values by the mean of the previous 2 and next 2 data points?
# Define the number of previous and next data points to consider
n = 2
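# Fill NaN values with the mean of the n previous and n next data points.
# center=True makes the window symmetric around each point; without it,
# pandas uses a trailing window that only looks at previous values.
d_ts = d_ts.fillna(d_ts.rolling(window=2 * n + 1, min_periods=1, center=True).mean())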
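To see what "the mean of the previous 2 and next 2 data points" means in practice, here is a small sketch on a hypothetical five-point series (`Demo_Log_Return` is an illustrative name, not a column in the data). It compares a centered rolling window with the default trailing window in pandas.

```python
import numpy as np
import pandas as pd

n = 2  # number of neighbours to use on each side

# Hypothetical series with one missing value in the middle
demo = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0], name="Demo_Log_Return")

# Centered window: covers n points before and n points after each position
centered_mean = demo.rolling(window=2 * n + 1, min_periods=1, center=True).mean()

# Default (trailing) window: covers the current point and the 2 * n points before it
trailing_mean = demo.rolling(window=2 * n + 1, min_periods=1).mean()

filled_centered = demo.fillna(centered_mean)
filled_trailing = demo.fillna(trailing_mean)

print(filled_centered.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(filled_trailing.tolist())  # [1.0, 2.0, 1.5, 4.0, 5.0]
```

With the centered window the gap is filled with (1 + 2 + 4 + 5) / 4 = 3.0, while the trailing window only sees the two earlier values and gives 1.5.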
from statsmodels.tsa.stattools import adfuller
# Define a significance level (e.g., 0.05)
alpha = 0.05
# Create a DataFrame to store ADF test results
adf_results = pd.DataFrame(columns=['ADF Statistic', 'p-value', 'Lags Used', 'Number of Observations', 'Critical Values'])
# Loop through each column
for column in d_ts.columns:
    # adfuller returns (statistic, p-value, lags used, observations, critical values, icbest)
    result = adfuller(d_ts[column], autolag='AIC')
    adf_results.loc[column] = [result[0], result[1], result[2], result[3], result[4]]
# Print ADF test results
print("ADF Test Results:")
print(adf_results)
# Interpret the results
for column in adf_results.index:
    adf_statistic = adf_results.loc[column]['ADF Statistic']
    p_value = adf_results.loc[column]['p-value']
    critical_values = adf_results.loc[column]['Critical Values']
    print(f"\nColumn: {column}")
    print(f"ADF Statistic: {adf_statistic}")
    print(f"p-value: {p_value}")
    print("Critical Values:")
    for key, value in critical_values.items():
        print(f" {key}: {value}")
    # The null hypothesis of the ADF test is that the series has a unit root
    if p_value <= alpha:
        print("Conclusion: Reject the null hypothesis (stationary)")
    else:
        print("Conclusion: Fail to reject the null hypothesis (non-stationary)")
ADF Test Results:
ADF Statistic p-value Lags Used \
GARAN_Log_Return -17.500708 4.387011e-30 1
YKBNK_Log_Return -16.991184 8.866652e-30 1
THYAO_Log_Return -17.416868 4.850605e-30 1
PETKM_Log_Return -17.479997 4.494702e-30 1
TCELL_Log_Return -22.327185 0.000000e+00 0
TTKOM_Log_Return -17.535836 4.213791e-30 1
Number of Observations \
GARAN_Log_Return 519
YKBNK_Log_Return 519
THYAO_Log_Return 519
PETKM_Log_Return 519
TCELL_Log_Return 520
TTKOM_Log_Return 519
Critical Values
GARAN_Log_Return {'1%': -3.4430126933746767, '5%': -2.867124983...
YKBNK_Log_Return {'1%': -3.4430126933746767, '5%': -2.867124983...
THYAO_Log_Return {'1%': -3.4430126933746767, '5%': -2.867124983...
PETKM_Log_Return {'1%': -3.4430126933746767, '5%': -2.867124983...
TCELL_Log_Return {'1%': -3.4429882202506255, '5%': -2.867114212...
TTKOM_Log_Return {'1%': -3.4430126933746767, '5%': -2.867124983...
Column: GARAN_Log_Return
ADF Statistic: -17.50070769201032
p-value: 4.387010940838542e-30
Critical Values:
1%: -3.4430126933746767
5%: -2.8671249839002764
10%: -2.569744590233924
Conclusion: Reject the null hypothesis (stationary)
Column: YKBNK_Log_Return
ADF Statistic: -16.991183829755364
p-value: 8.866651577983549e-30
Critical Values:
1%: -3.4430126933746767
5%: -2.8671249839002764
10%: -2.569744590233924
Conclusion: Reject the null hypothesis (stationary)
Column: THYAO_Log_Return
ADF Statistic: -17.416867730851532
p-value: 4.8506045119449034e-30
Critical Values:
1%: -3.4430126933746767
5%: -2.8671249839002764
10%: -2.569744590233924
Conclusion: Reject the null hypothesis (stationary)
Column: PETKM_Log_Return
ADF Statistic: -17.479996597162767
p-value: 4.494702306595752e-30
Critical Values:
1%: -3.4430126933746767
5%: -2.8671249839002764
10%: -2.569744590233924
Conclusion: Reject the null hypothesis (stationary)
Column: TCELL_Log_Return
ADF Statistic: -22.32718455529764
p-value: 0.0
Critical Values:
1%: -3.4429882202506255
5%: -2.8671142122781066
10%: -2.569738849852071
Conclusion: Reject the null hypothesis (stationary)
Column: TTKOM_Log_Return
ADF Statistic: -17.53583590673761
p-value: 4.2137911440693824e-30
Critical Values:
1%: -3.4430126933746767
5%: -2.8671249839002764
10%: -2.569744590233924
Conclusion: Reject the null hypothesis (stationary)
adf_results
| | ADF Statistic | p-value | Lags Used | Number of Observations | Critical Values |
|---|---|---|---|---|---|
| GARAN_Log_Return | -17.500708 | 4.387011e-30 | 1 | 519 | {'1%': -3.4430126933746767, '5%': -2.867124983... |
| YKBNK_Log_Return | -16.991184 | 8.866652e-30 | 1 | 519 | {'1%': -3.4430126933746767, '5%': -2.867124983... |
| THYAO_Log_Return | -17.416868 | 4.850605e-30 | 1 | 519 | {'1%': -3.4430126933746767, '5%': -2.867124983... |
| PETKM_Log_Return | -17.479997 | 4.494702e-30 | 1 | 519 | {'1%': -3.4430126933746767, '5%': -2.867124983... |
| TCELL_Log_Return | -22.327185 | 0.000000e+00 | 0 | 520 | {'1%': -3.4429882202506255, '5%': -2.867114212... |
| TTKOM_Log_Return | -17.535836 | 4.213791e-30 | 1 | 519 | {'1%': -3.4430126933746767, '5%': -2.867124983... |
ChatGPT Prompt: Write a code for Value at Risk calculations and explain in detail what is VaR? Give general informations about VaR.
from scipy.stats import norm
# Define the confidence level (e.g., 95% confidence level)
confidence_level = 0.95
# Define the time horizon (e.g., 1 day)
time_horizon = 1
# Extract the 'GARAN_Log_Return' column as a NumPy array
returns = d_ts['GARAN_Log_Return'].values
# Calculate the mean and standard deviation of returns
mean_return = np.mean(returns)
std_dev = np.std(returns)
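# Calculate the Z-score for the given confidence level.
# Note: this expression is the two-sided quantile (z ≈ 1.96 at 95%), which
# corresponds to a one-sided 97.5% VaR. A one-sided 95% VaR would use
# norm.ppf(confidence_level) (z ≈ 1.645). The same expression is used for
# the other tickers below.
z_score = norm.ppf(1 - (1 - confidence_level) / 2)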
# Calculate the VaR using the parametric method (normal distribution)
var_parametric = -mean_return * time_horizon + z_score * std_dev * np.sqrt(time_horizon)
# Print the VaR
print(f"Parametric VaR at {confidence_level * 100}% confidence for {time_horizon} day(s): {var_parametric:.4f}")
Parametric VaR at 95.0% confidence for 1 day(s): 0.0014
# Define the confidence level (e.g., 95% confidence level)
confidence_level = 0.95
# Define the time horizon (e.g., 1 day)
time_horizon = 1
# Extract the 'YKBNK_Log_Return' column as a NumPy array
returns = d_ts['YKBNK_Log_Return'].values
# Calculate the mean and standard deviation of returns
mean_return = np.mean(returns)
std_dev = np.std(returns)
# Calculate the Z-score for the given confidence level
z_score = norm.ppf(1 - (1 - confidence_level) / 2)
# Calculate the VaR using the parametric method (normal distribution)
var_parametric = -mean_return * time_horizon + z_score * std_dev * np.sqrt(time_horizon)
# Print the VaR
print(f"Parametric VaR at {confidence_level * 100}% confidence for {time_horizon} day(s): {var_parametric:.4f}")
Parametric VaR at 95.0% confidence for 1 day(s): 0.0014
# Define the confidence level (e.g., 95% confidence level)
confidence_level = 0.95
# Define the time horizon (e.g., 1 day)
time_horizon = 1
# Extract the 'THYAO_Log_Return' column as a NumPy array
returns = d_ts['THYAO_Log_Return'].values
# Calculate the mean and standard deviation of returns
mean_return = np.mean(returns)
std_dev = np.std(returns)
# Calculate the Z-score for the given confidence level
z_score = norm.ppf(1 - (1 - confidence_level) / 2)
# Calculate the VaR using the parametric method (normal distribution)
var_parametric = -mean_return * time_horizon + z_score * std_dev * np.sqrt(time_horizon)
# Print the VaR
print(f"Parametric VaR at {confidence_level * 100}% confidence for {time_horizon} day(s): {var_parametric:.4f}")
Parametric VaR at 95.0% confidence for 1 day(s): 0.0015
# Define the confidence level (e.g., 95% confidence level)
confidence_level = 0.95
# Define the time horizon (e.g., 1 day)
time_horizon = 1
# Extract the 'PETKM_Log_Return' column as a NumPy array
returns = d_ts['PETKM_Log_Return'].values
# Calculate the mean and standard deviation of returns
mean_return = np.mean(returns)
std_dev = np.std(returns)
# Calculate the Z-score for the given confidence level
z_score = norm.ppf(1 - (1 - confidence_level) / 2)
# Calculate the VaR using the parametric method (normal distribution)
var_parametric = -mean_return * time_horizon + z_score * std_dev * np.sqrt(time_horizon)
# Print the VaR
print(f"Parametric VaR at {confidence_level * 100}% confidence for {time_horizon} day(s): {var_parametric:.4f}")
Parametric VaR at 95.0% confidence for 1 day(s): 0.0011
# Define the confidence level (e.g., 95% confidence level)
confidence_level = 0.95
# Define the time horizon (e.g., 1 day)
time_horizon = 1
# Extract the 'TCELL_Log_Return' column as a NumPy array
returns = d_ts['TCELL_Log_Return'].values
# Calculate the mean and standard deviation of returns
mean_return = np.mean(returns)
std_dev = np.std(returns)
# Calculate the Z-score for the given confidence level
z_score = norm.ppf(1 - (1 - confidence_level) / 2)
# Calculate the VaR using the parametric method (normal distribution)
var_parametric = -mean_return * time_horizon + z_score * std_dev * np.sqrt(time_horizon)
# Print the VaR
print(f"Parametric VaR at {confidence_level * 100}% confidence for {time_horizon} day(s): {var_parametric:.4f}")
Parametric VaR at 95.0% confidence for 1 day(s): 0.0011
# Define the confidence level (e.g., 95% confidence level)
confidence_level = 0.95
# Define the time horizon (e.g., 1 day)
time_horizon = 1
# Extract the 'TTKOM_Log_Return' column as a NumPy array
returns = d_ts['TTKOM_Log_Return'].values
# Calculate the mean and standard deviation of returns
mean_return = np.mean(returns)
std_dev = np.std(returns)
# Calculate the Z-score for the given confidence level
z_score = norm.ppf(1 - (1 - confidence_level) / 2)
# Calculate the VaR using the parametric method (normal distribution)
var_parametric = -mean_return * time_horizon + z_score * std_dev * np.sqrt(time_horizon)
# Print the VaR
print(f"Parametric VaR at {confidence_level * 100}% confidence for {time_horizon} day(s): {var_parametric:.4f}")
Parametric VaR at 95.0% confidence for 1 day(s): 0.0012
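The six blocks above differ only in the column name, so the calculation could be wrapped in a function and applied to every column at once. The sketch below is one way to do that under stated assumptions: it uses the one-sided quantile `norm.ppf(confidence_level)`, so its numbers are not directly comparable to the outputs above, and it adds a historical (empirical quantile) VaR for comparison. The function names and the synthetic columns `A_Log_Return` and `B_Log_Return` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm


def parametric_var(returns, confidence_level=0.95, time_horizon=1):
    """Normal-distribution VaR, reported as a positive loss."""
    values = np.asarray(returns, dtype=float)
    values = values[~np.isnan(values)]
    mean_return = values.mean()
    std_dev = values.std(ddof=1)
    # One-sided quantile: about 1.645 at the 95% level
    z_score = norm.ppf(confidence_level)
    return -mean_return * time_horizon + z_score * std_dev * np.sqrt(time_horizon)


def historical_var(returns, confidence_level=0.95):
    """Empirical VaR: the loss exceeded on (1 - confidence_level) of the days."""
    values = np.asarray(returns, dtype=float)
    values = values[~np.isnan(values)]
    return -np.quantile(values, 1 - confidence_level)


# Synthetic daily log returns (illustrative only)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "A_Log_Return": rng.normal(0, 0.01, 5000),
    "B_Log_Return": rng.normal(0, 0.02, 5000),
})

# One row per column, one VaR estimate per method
var_table = pd.DataFrame({
    "parametric": demo.apply(parametric_var),
    "historical": demo.apply(historical_var),
})
print(var_table.round(4))
```

A 95% one-day VaR of x is read as: on roughly 95% of days the loss is not expected to exceed x. When the two methods disagree noticeably, that is a hint that the returns are not well described by a normal distribution.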
The more complex code was obtained from ChatGPT (GPT-3.5 Turbo). Because one of our group members (Burak) works as a data scientist, some of the code was taken from his previous studies and projects. We did not want to rely on ChatGPT for all of the work; the documentation of the libraries also helped a lot.
https://chat.openai.com/?model=text-davinci-002-render-sha
https://pandas.pydata.org/docs/
https://matplotlib.org/stable/index.html
https://www.statsmodels.org/stable/index.html
https://www.yapikredi.com.tr/yapi-kredi-hakkinda/piyasa-bulteni/
https://www.garantibbvayatirim.com.tr/arastirma-raporlari/g%C3%BCnl%C3%BCk-b%C3%BCltenler